Perl does some very useful things and provides such huge resources in the CPAN library (http://cpan.org) that it will clearly be with us for a long time yet as a way of writing scripts to run behind Apache. But while Perl is powerful, CGI is not a particularly efficient means of connecting Perl to Apache. CGI's big disadvantage is that each time a script is invoked, Apache has to start the Perl interpreter and then load and compile the script. This is a heavy and pointless overhead on a busy site; it would obviously be much more efficient if Perl stayed loaded in memory, together with the scripts, ready to be invoked each time they were needed. This is what mod_perl does by modifying Apache.
This modification is definitely popular: according to Netcraft surveys in mid-2000, mod_perl was the third most popular add-on to Apache (after FrontPage and PHP), serving more than a million URLs on over 120,000 different IP addresses (http://perl.apache.org/outstanding/stats/netcraft.html).
The reason this chapter is more than a couple of pages long is that Perl does not sit easily in a web server. It was originally designed as a better shell-scripting language to run standalone under Unix, and it developed, over time, into a full-blown programming language. Because the original Perl was not designed for this kind of work, a number of adjustments have to be made. To illustrate them, we will start with a simple Perl script that runs under Apache's mod_cgi and then modify it to run under mod_perl. (We assume that the reader is familiar enough with Perl to write a simple script and understands the ideas of Perl modules, use( ), require( ), and BEGIN and END blocks.)
In site.mod_perl we have two subdirectories: mod_cgi and mod_perl. In mod_cgi we present a simple script-driven site whose home page has a link to another page.
The Config file is as follows:
User webuser
Group webuser
ServerName www.butterthlies.com
DocumentRoot /usr/www/APACHE3/APACHE3/site.mod_perl/mod_cgi/htdocs
TransferLog /usr/www/APACHE3/APACHE3/site.mod_perl/mod_cgi/logs/access_log
LogLevel debug
ScriptAlias /bin /usr/www/APACHE3/APACHE3/site.mod_perl/cgi-bin
ScriptAliasMatch /AA(.*) /usr/www/APACHE3/APACHE3/site.mod_perl/cgi-bin/AA$1
DirectoryIndex /bin/home.pl
When you go to http://www.butterthlies.com, you see the results of running the Perl script home:
#! /usr/local/bin/perl -w
use strict;
print qq(content-type: text/html\n\n
<HTML><HEAD><TITLE>Demo CGI Home Page</TITLE></HEAD>
<BODY>Hi: I'm a demo home page
<A HREF="/AA_next">Click here to run my mate</A>
</BODY></HTML>);
On the browser, this simply says:
Hi: I'm a demo home page. Click here to run my mate
And when you do, you get:
Hi: I'm a demo next page
This is printed by the script AA_next:
#! /usr/local/bin/perl -w
use strict;
print qq(content-type: text/html\n\n
<HTML><HEAD><TITLE>NEXT Page</TITLE></HEAD>
<BODY>Hi: I'm a demo next page
</BODY></HTML>);
Naturally, this is a web site that will run and run and make everyone concerned into e-billionaires. In the process of serving the millions of visitors it will attract, Perl will get loaded and unloaded millions of times, which helps to explain why they are running out of electricity in Silicon Valley. We have to stop this reckless waste of the world's resources, so we install mod_perl.
The principle of mod_perl is simple enough: Perl is loaded into Apache when it starts up — which makes for very big Apache child processes. This saves the time that would be spent loading and unloading the Perl interpreter but calls for a lot more RAM.
If you use Apache::PerlRun, you get a half-way environment where Perl is kept in memory but scripts are loaded each time they are run. Most CGI scripts will work right away in this environment.
If you go whole hog and use Apache::Registry, your scripts are kept in memory too: each one is compiled the first time a child process runs it and is then reused, which saves the overhead of loading and compiling it on every request. If your scripts use a database, you can also keep an open connection to it and so save time there as well (see later). Good as this is for execution speed, there is a drawback: your scripts now all run as subroutines below a hidden main program. The problem with this, and it can be a killer if you get it wrong, is that global variables are not reinitialized between requests; they keep whatever values the previous request left in them. More of this follows.
The problems of mod_perl — which are not that serious — almost all stem from the fact that all your separate scripts now run as a single script in a rather odd environment.
Also, because Apache and Perl are now rather intimately blended, the interface between them is correspondingly fuzzy. Rather surprisingly, we can now include Perl code in the Apache Config file (in <Perl> sections), though we will not go to such extreme lengths here.
Since things are more complicated, there are more things to go wrong and a greater need for careful testing. The error_log is going to be your best friend. Make sure that correct line numbers are enabled when you compile mod_perl (see PERL_MARK_WHERE, later), and you may want to use Carp at runtime to get fuller error messages.
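As a rough illustration of what Carp adds, here is a minimal sketch using its confess function, which dies with a full stack backtrace instead of a bare message. The subroutines parse_id and handle_request are hypothetical stand-ins for your own code; under mod_perl the text of an uncaught die ends up in the error_log.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Carp qw(confess);

# Hypothetical helper: rejects anything that is not a plain number.
sub parse_id {
    my ($id) = @_;
    # confess dies with the message followed by a stack backtrace,
    # listing every caller on the way down and its arguments.
    confess "bad id '$id'" unless defined $id && $id =~ /^\d+$/;
    return $id + 0;
}

# Hypothetical stand-in for the top of a request handler.
sub handle_request {
    my ($raw) = @_;
    return parse_id($raw);
}

# Trap the error here only to display it; under mod_perl an
# uncaught die would go to the error_log instead.
eval { handle_request('abc'); 1 } or print STDERR $@;
```

The backtrace shows that parse_id was reached through handle_request and with which argument, which is usually what you need to find the guilty call in a long script.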
Before doing anything, it would be sensible to cast a glance at the documentation: what are we getting? What can we do with it? What are the pitfalls?
In line with the maturity (or bloat) of the Apache project, there is a stunning amount of this material at http://perl.apache.org/#docs. We started off by downloading The mod_perl Guide by Stas Bekman at http://perl.apache.org/guide. There must be more than 500 pages, many of which are applicable only to very specialized situations. Obviously we cannot transcribe or usefully compress this amount of material into a few pages here. Be aware that it exists and if you have problems, look there first and thoroughly: you may very well find an answer.
We assume, to begin with, that you are running on some sort of Unix machine, you have downloaded the Apache sources, built Apache, and that now you are going to add mod_perl.
The first thing to do is to get the mod_perl sources. Go to http://apache.org. In the list of links to the left of the screen you should see "mod_perl": select it. This takes you to http://perl.apache.org, the home page of the Apache/Perl Integration Project.
The first step is to select "Download," which offers you a number of ways of getting the sources. The simplest is to download from http://perl.apache.org/dist, but there are many alternatives. When we did it, the gzipped tar file on offer was mod_perl-1.24.tar.gz; no doubt the numbers will have moved on by the time this is in print. This gives you a file of about 600 KB that you get onto your Unix machine as best you can.
It is worth saving it in a directory near your Apache sources, because this slightly simplifies the business of building and installing it later on. We keep all this stuff in /usr/src/apache/mod_perl, near where the Apache sources were already stored. We created that directory, moved the downloaded file into it, unzipped it with gunzip <filename>, and extracted the files with tar xvf <filename>, so we have /usr/src/apache/mod_perl/mod_perl-1.24 and, not very far away, /usr/src/apache/apache_1.3.26.
Go into /usr/src/apache/mod_perl/mod_perl-1.24, and read INSTALL. The simple way of installing the package offers no surprises:
perl Makefile.PL
make
make test
make install
For some reason, we found we had to repeat the whole process two or three times before it all went smoothly without error messages. So if you get obscure complaints, go back to the top and try again before beginning to scream.
Some clever things happen, culminating in a recompile of Apache. This works because the mod_perl makefile looks for an Apache source tree in a neighboring directory. If you want to take this route, make sure that the right version is in the right place. If the installation process cannot find an Apache source directory, it will ask you where to look. This process generates a new httpd in /usr/src/apache/apache_1.3.26/src, which needs to be copied to wherever you keep your executables: in our case, /usr/local/bin.
To make experimentation easier, you might not want to overwrite the old, non-mod_perl httpd, so save the new one as httpd.perl. The change of size is striking: up from 480 KB to 1.2 MB. Luckily, we will only have to load it once when Apache starts up.
In The mod_perl Guide, Bekman gives five different recipes for installing mod_perl.
The first is a variant on the method we gave earlier, with the difference that various makefile parameters allow you to control the operation more precisely:
perl Makefile.PL APACHE_SRC=../../apache_x.x.x/src DO_HTTPD=1 EVERYTHING=1
The x's stand for the version number of your Apache source. DO_HTTPD=1 creates a new Apache executable, and EVERYTHING=1 turns on all the other mod_perl features. For a complete list of parameters and their applications, see the documentation. This seems to have much the same effect as simply running:
perl Makefile.PL
If you want to use the one-step, predigested method of building Apache with APACI (the Apache AutoConf-style Interface), you can do that with this:
perl Makefile.PL APACHE_SRC=../../apache_x.x.x/src DO_HTTPD=1 \
    EVERYTHING=1 USE_APACI=1
Note the backslash at the end of the first line: it tells the shell that the command continues on the next line.
Two more recipes concern DSOs (Dynamic Shared Objects), that is, executables that Apache can load when needed and unload when not. We don't suggest that you use these for serious business, firstly because we are not keen on DSOs, and secondly because mod_perl is not a module you want to load and unload. If you use it at all, you are very likely to need it all the time.
So far so good, but in real life you may very well want to link more than one module into your Apache. The idea here is to set up all the modules in the Apache source tree before building it.
Download both source files into the appropriate places on your machine. Go into the mod_perl directory, and prepare the src/modules/perl subdirectory in the Apache source tree with the following:
perl Makefile.PL APACHE_SRC=../../apache_x.x.x/src \
    NO_HTTPD=1 \
    USE_APACI=1 \
    PREP_HTTPD=1 \
    EVERYTHING=1
make
make test
make install
The PREP_HTTPD option forces the preparation of the Apache Perl tree but does not build Apache yet.
Having prepared mod_perl, you can now also prepare other modules. Later on we will demonstrate this by including mod_PHP.
When everything is ready, build the new Apache by going into the top-level Apache source directory (the one that contains the configure script and the src directory) and typing:
./configure --activate-module=src/modules/perl/libperl.a [and similar for other modules]
make
Having built mod_perl, you should then test the result with make test. This process does its own arcane stuff, skipping various tests that are inappropriate for your platform. With luck, it ends with the cheerful message "All tests successful..." If it finds problems, it writes them to the file ...t/logs/error_log. You can now do make install on the Perl side, and again on the Apache side, and copy the new httpd (perhaps as httpd.perl) to the directory where your executables live, as described earlier.
Wherever there is Perl, there are "gotchas" — the invisible traps that nullify your best efforts — and there are a few lurking here.
If you use DO_HTTPD=1 or NO_HTTPD and don't use APACHE_SRC, then the Apache build will take place in the first Apache directory found, rather than the one with the highest release number.
If you are using Apache::Registry scripts (see later), line numbers will be wrongly reported in the error_log file. To get the correct numbers (or at least an approximation to them), build with PERL_MARK_WHERE=1. It is hard to see why anyone would prefer wrong line numbers, but this is part of the richness of the world of Perl.
If you use backslashes to indicate line breaks in the argument list to Makefile.PL and you are running the tcsh shell, the backslashes will be stripped out, and all the parameters after the first backslash will be ignored.
If you put the mod_perl directory inside the Apache directory, everything will go horribly wrong.
If you escaped these gotchas, don't be afraid that you have missed the fun: there are more to come. Building software the first time is a challenge, and one makes the effort to get it right.
Building it again, perhaps months or even years later, usually happens after some other drama, like a dead hard disk or a move to a different machine. At this stage one often has other things to think about, and repeating the build from memory can often be painful. mod_perl offers a civilized way of storing the configuration by making Makefile.PL look for parameters in the file makepl_args.mod_perl — you can put your parameters there the first time around and just run perl Makefile.PL. However, any command-line parameters will override those in the file.
Under Unix, you can get the same effect with any script that takes its parameters on the command line by running:
perl Makefile.PL `cat ~/.build_parameters`
cat and the backticks cause the contents of the file ~/.build_parameters to be extracted and passed to Makefile.PL as arguments.
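If you would rather keep the whole build in Perl, the backtick trick can be sketched like this. read_build_params and run_makefile_pl are hypothetical helpers, not part of mod_perl; like cat and backticks, they simply split the file on whitespace, so comments and values containing spaces are not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the words in a parameter file, split on
# whitespace, much as the shell splits the output of `cat`.
sub read_build_params {
    my ($file) = @_;
    open my $fh, '<', $file or die "can't read $file: $!";
    local $/;                        # slurp mode: read the whole file at once
    my $text = <$fh> // '';
    return grep { length } split /\s+/, $text;
}

# Hypothetical helper: run Makefile.PL with those parameters.
sub run_makefile_pl {
    my ($file) = @_;
    my @args = read_build_params($file);
    # List form of system(): no shell is involved, each word is one
    # argument. $^X is the perl that is running this script.
    system($^X, 'Makefile.PL', @args) == 0
        or die "Makefile.PL failed: $?";
}

# Only does something when given a file, e.g.
#   perl build.pl ~/.build_parameters
run_makefile_pl($ARGV[0]) if @ARGV;
```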
Many scripts that run under mod_cgi will run under mod_perl if you specify Apache::PerlRun in the Config file. This in itself speeds things up because Perl does not have to be reloaded for each call; scripts that have been tidied up, or written especially for mod_perl, will run even better under Apache::Registry.
You may want to experiment with different Config files and scripts. Under Apache::Registry, an edited script is normally recompiled when its modification time changes, but modules that it loads with use or require are not reloaded until you restart Apache.
The biggest single "gotcha" for scripts running under Apache::Registry is caused by global variables. The mod_cgi environment is rather kind to the slack programmer. Your scripts, which tend to be short and simple, get loaded, run, and then thrown away. Perl rather considerately initializes all variables to undef at startup, so one tends to forget about the dangers they represent.
Unhappily, under mod_perl and Apache::Registry, scripts effectively run as subroutines. Global variables get initialized the first time a child process runs the script, as usual, but not again, so if you don't explicitly initialize them at each call, they will carry forward whatever value they had after the last call. What makes these bugs more puzzling is that each Apache child process starts with freshly initialized variables, so the errant behavior will not begin to show until a child process is used a second time, and maybe not even then.
There are several lines of attack:
Do away with every global variable that isn't absolutely necessary
Make sure that every global variable that survives is initialized
Put your code into modules as subroutines and call it from the main script: the module's subroutines are not nested inside Registry's hidden wrapper, so variables declared with my inside them are created afresh on every call
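A minimal sketch of the last two lines of attack, assuming a hypothetical module My::Page (kept in the same file here only so that the example runs on its own). Variables declared with my inside the module subroutine are created afresh on every call, and the one global that survives is explicitly reset at the start of each request; one_request is a hypothetical stand-in for one pass through a Registry script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Page;    # hypothetical module name

# $count is declared inside the sub, so every call gets a new one.
sub count_to {
    my ($n) = @_;
    my $count = 0;
    $count++ for 1 .. $n;
    return $count;
}

package main;

our $visits;         # a global that survived the purge...

# Stands in for the body of one Apache::Registry request.
sub one_request {
    $visits = 0;     # ...explicitly initialized on every request
    $visits += My::Page::count_to(5);
    return $visits;
}
```

Calling one_request repeatedly gives the same answer every time, which is exactly the property the mod_cgi version had for free.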
To illustrate this tiresome behavior, we created a new directory /usr/www/APACHE3/APACHE3/site.mod_perl/mod_perl and copied everything across into it from .../mod_cgi. The startup script go was now:
httpd.perl -d /usr/www/APACHE3/APACHE3/site.mod_perl/mod_perl
The Config file is as follows:
User webuser
Group webuser
ServerName www.butterthlies.com
LogLevel debug
DocumentRoot /usr/www/APACHE3/APACHE3/site.mod_perl/mod_cgi/htdocs
TransferLog /usr/www/APACHE3/APACHE3/site.mod_perl/logs/access_log
ErrorLog /usr/www/APACHE3/APACHE3/site.mod_perl/logs/error_log
LogLevel debug
#change to AliasMatch from ScriptAliasMatch
AliasMatch /(.*) /usr/www/APACHE3/APACHE3/site.mod_perl/cgi-bin/$1
DirectoryIndex /bin/home
Alias /bin /usr/www/APACHE3/APACHE3/site.mod_perl/cgi-bin
SetHandler perl-script
PerlHandler Apache::Registry
#PerlHandler Apache::PerlRun
Notice that the convenient directives ScriptAlias and ScriptAliasMatch, which effectively combine an Alias directive with SetHandler cgi-script for use under mod_cgi, can no longer be used, because they would hand the scripts to mod_cgi rather than to mod_perl.
Instead, you have to declare an Alias, then say that you are running perl-script, and then choose which flavor of mod_perl handler you want.
The script home is now:
#! /usr/local/bin/perl -w
use strict;
print qq(content-type: text/html\n\n);
my $global=0;
for (1 .. 5) {
    &inc_g( );
}
print qq(<HTML><HEAD><TITLE>Demo CGI Home Page</TITLE></HEAD>
<BODY>Hi: I'm a demo home page. Global = $global<BR>
<A HREF="/AA_next">Click here to run my mate</A>
</BODY></HTML>);
sub inc_g( ) {
    $global+=1;
    print qq(global = $global<BR>);
}
If you fire up Apache and watch the output, you don't have to reload the page many times (having turned off caching in your browser, of course) before you see the following unnerving display:
content-type: text/html
global = 21
global = 22
global = 23
global = 24
global = 25
Hi: I'm a demo home page. Global = 0
Click here to run my mate
This unpleasant behavior is accompanied by the following message in the error_log file:
Variable "$global" will not stay shared at /usr/www/APACHE3/APACHE3/site.mod_perl/cgi-bin/home
which should give you a pretty good warning that all is not well. If you start Apache up using the -X flag — to prevent child processes — then the bad behavior begins on the first reload.
It will not happen at all if you use the line:
PerlHandler Apache::PerlRun
because under PerlRun, although Perl itself stays loaded, your scripts are reloaded at each call — and, of course, all the variables are initialized. There is a performance penalty, of course.
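The behavior of the home script can be reproduced without Apache. Apache::Registry wraps the whole script in a subroutine, so inc_g becomes a named subroutine nested inside another one. Named subroutines are compiled only once, and inc_g stays bound to the $global of the first call; every later call gets a new $global that inc_g never touches. In this sketch registry_request is a hypothetical imitation of that wrapping, not Registry's own code, and the anonymous-subroutine version shows one common way out:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical imitation of Apache::Registry: the script body becomes
# the body of a subroutine that is called once per request.
sub registry_request {
    my $global = 0;
    # A named sub is compiled only once and stays bound to the $global
    # of the first call. Perl would normally warn "will not stay shared".
    no warnings 'closure';
    sub inc_g { $global += 1 }
    inc_g() for 1 .. 5;
    return $global;          # 5 on the first call, 0 after that
}

# One way out: an anonymous sub is created afresh on every call,
# so it closes over this call's $global.
sub registry_request_fixed {
    my $global = 0;
    my $inc_g = sub { $global += 1 };
    $inc_g->() for 1 .. 5;
    return $global;          # 5 every time
}
```

Call registry_request twice and the first call returns 5 while the second returns 0, which is just what the -X experiment shows on the first reload. Passing the variable to the subroutine as an argument, or moving the subroutine into a module, avoids the trap equally well.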
When your scripts ran under mod_cgi, they started off with the "shebang line":
#! /usr/local/bin/perl -w -T
Under mod_perl this line is no longer necessary. It is tolerated, though, so you don't have to remove it, and the -w flag is even picked up and turns on warnings. It would be too simple if all the other possible flags were also recognized: if you use -T to invoke taint checking, it won't work. Instead, you have to use PerlTaintCheck On and PerlWarn On in the Apache Config file. We recommend that you always use PerlTaintCheck to guard against attempts to hack your scripts by way of dubious entries in HTML forms. Keep PerlWarn on while the scripts are being developed, but turn warnings off in production: one warning per visitor, written to the log file on a busy site, can soon use up all the available disk space and bring the server to a halt.
It is extremely important to:
use strict;
under mod_perl, to detect unsafe Perl constructs.
Under mod_cgi, and under mod_perl with Apache::PerlRun, you simply have to edit a script and save it for the change to take effect. Under mod_perl and Apache::Registry, an edited script is normally recompiled when its modification time changes, but modules it loads with use or require are not reloaded until you restart Apache. Stas Bekman (http://perl.apache.org/guide/config.html) gives some very elaborate ways of reloading, including a method of rewriting your Config file via an HTML form. Although this sort of trick may amaze and delight your friends, we feel it may please your enemies even more, since they will find in it new and exciting ways of penetrating your security. We see nothing wrong with restarting Apache with the script stop_go, though it may give anyone who is using your site at that moment a surprise:
kill -USR1 `cat logs/httpd.pid`
This reloads Perl, loads the scripts afresh, and reinitializes all variables.
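For those who prefer Perl to the shell, stop_go can be sketched as below. The pid-file path is an assumption (use whatever your PidFile directive names), and signal_from_pidfile is a hypothetical helper; it relies only on Perl's built-in kill, which returns the number of processes it managed to signal.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read a pid from an Apache pid file and signal it.
sub signal_from_pidfile {
    my ($pidfile, $signal) = @_;
    open my $fh, '<', $pidfile or die "can't read $pidfile: $!";
    my $line = <$fh>;
    die "no pid in $pidfile\n"
        unless defined $line && $line =~ /^\s*(\d+)\s*$/;
    # kill returns the number of processes successfully signalled.
    return kill $signal, $1;
}

# Run as, e.g.:  perl stop_go.pl logs/httpd.pid
# (the pid file location is an assumption; check your PidFile directive)
if (@ARGV) {
    signal_from_pidfile($ARGV[0], 'USR1') or die "kill failed: $!\n";
}
```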
Another consequence of scripts remaining permanently loaded is that opened files are not automatically closed when a script finishes, because the Perl interpreter running it doesn't terminate until Apache is shut down. Files left open will eat up memory and file handles, so it is important that every opened file be explicitly closed. However, it is not good enough just to use close( ) conscientiously, because something may go wrong in the script, causing it to exit without executing the close( ) statement. The cure is to use the IO modules (IO::File, loaded here by use IO). The file handle is then held in a lexical variable and is closed automatically when that variable goes out of scope:
use IO;
...
my $fh=IO::File->new("name") or die $!;
$fh->print($text);
#or
$stuff=<$fh>;
# $fh closes automatically
Alternatively:
use Symbol;
...
my $fh=Symbol::gensym;
open $fh, $filename or die $!;
....
#automatic close
From Perl 5.6.0 onward, this is enough:
open my $fh, $filename or die $!;
...
# automatic close
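The automatic close is easy to check. In this sketch (the helper names are hypothetical), write_without_close never calls close( ); when the subroutine returns, the lexical $fh goes out of scope, Perl closes the file and flushes its buffer, so reading the file back finds all the text there. It uses the three-argument form of open, also available from Perl 5.6.0.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write through a lexical filehandle and never call close().
# When $fh goes out of scope at the end of the sub, Perl closes
# the file, which also flushes the buffered text to disk.
sub write_without_close {
    my ($filename, $text) = @_;
    open my $fh, '>', $filename or die "can't write $filename: $!";
    print $fh $text;
    return;    # no close: $fh is freed here
}

# Read a whole file back, again relying on the automatic close.
sub slurp {
    my ($filename) = @_;
    open my $fh, '<', $filename or die "can't read $filename: $!";
    local $/;
    return scalar <$fh>;
}
```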
Bearing all this in mind, we can now set up the Config file neatly. In line with convention, we rename .../cgi-bin to .../perl. We can then put most of the Perl stuff neatly in a <Location> block:
User webuser
Group webuser
ServerName www.butterthlies.com
DocumentRoot /usr/www/APACHE3/APACHE3/site.mod_perl/mod_cgi/htdocs
TransferLog /usr/www/APACHE3/APACHE3/site.mod_perl/logs/access_log
ErrorLog /usr/www/APACHE3/APACHE3/site.mod_perl/logs/error_log
#change this before production!
LogLevel debug
AliasMatch /perl(.*) /usr/www/APACHE3/APACHE3/site.mod_perl/perl/$1
Alias /perl /usr/www/APACHE3/APACHE3/site.mod_perl/perl
DirectoryIndex /perl/home
PerlTaintCheck On
PerlWarn On
<Location /perl>
    SetHandler perl-script
    PerlHandler Apache::Registry
    #PerlHandler Apache::PerlRun
    Options ExecCGI
    PerlSendHeader On
</Location>
Remember to reduce the LogLevel before using this in earnest! Note that the two directives:
PerlTaintCheck On
PerlWarn On
won't go into the <Location> block because they are executed when Perl loads.
A quick web site is well on the way to being a good web site. It is probably worth taking a little trouble to speed up your scripts; but bear in mind that most elapsed time on the Web is spent by clients looking at their browser screens, trying to work out what they're about.
We discuss the larger problems of speeding up whole sites in Chapter 12. Here we offer a few tips on making scripts run faster in less space. The faster they run, the more clients you can serve in sequence; the less space they run in, the more copies you can run and the more clients you can serve simultaneously. However, if your site attracts so many people that it still bogs down, you can surely afford to throw more hardware at it. If you can't, why are you bothering?
Users of FreeBSD might like to look at http://www.freebsd.org/cgi/man.cgi?query=tuning for some basic suggestions.
The search for perfect optimization can get into subtle and time-consuming byways that are very dependent on the details of how your scripts work. A good reason not to spend too much time on optimizing your code is that the small change you make tomorrow to fix a maintenance problem will probably throw the hard-won optimizations all out of whack.
The whole point of using mod_perl is to get more business out of your server. Just installing it and configuring it as shown earlier will help, but there is more you can do.
When mod_perl starts, it has to load the modules used by your scripts:
...
use strict;
use DBI( );
use CGI;
...
In the normal way of Perl, modules are compiled as scripts load them: Perl scans them for errors and puts them into executable form. It is better to get this done once, at startup, and this matters particularly for the big CGI module, which by default compiles many of its functions only when they are first called. You can have it compile them in advance with the compile command:
...
use strict;
use DBI( );
use CGI;
CGI->compile(<tags>);
...
You would replace <tags> by a list of the CGI subroutines you actually use.
If you use a database, your scripts will be constantly opening and closing database connections. This wastes time; Apache::DBI avoids it by keeping connections open from one request to the next.
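The idea behind Apache::DBI can be sketched in a few lines: keep one connection per set of connect arguments alive for the life of the child process and hand it back on later requests. This is a toy, not Apache::DBI's code; expensive_connect is a hypothetical stand-in for DBI->connect, and the cache is a package variable (our) so that the sketch does not itself fall into the closure trap described earlier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Package variables: they live as long as the (Apache child) process
# and avoid the "will not stay shared" trap of file-scoped my variables.
our %cache;
our $real_connects = 0;

# Hypothetical stand-in for DBI->connect; slow in real life.
sub expensive_connect {
    my ($dsn, $user) = @_;
    $real_connects++;
    return { dsn => $dsn, user => $user, serial => $real_connects };
}

# Hand back the cached handle for these arguments, connecting only once.
sub cached_connect {
    my ($dsn, $user) = @_;
    my $key = join "\0", $dsn, $user;
    $cache{$key} //= expensive_connect($dsn, $user);
    return $cache{$key};
}
```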
It is worth turning off KeepAlive (see Chapter 3) on busy sites, because it keeps the server connected to each client for up to KeepAliveTimeout seconds after a request, even if the client is doing nothing. This ties up processes, and therefore memory. Because each connection corresponds to a process, and each process has a whole instance of Perl plus all the cached compiled code and persistent variables, this can be a great deal of memory, far more than with more ordinary Apache usage. Likewise, tuning MaxClients to avoid swapping can improve performance even though, paradoxically, it makes some people wait for a free process.
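A rough rule of thumb for MaxClients follows from simple division: the memory you can give Apache divided by the size of one child. The figures here are hypothetical (400 MB free for Apache, children of about 20 MB each), and the estimate ignores memory shared between children, so it errs on the safe side:

$$\text{MaxClients} \approx \left\lfloor \frac{\text{RAM available to Apache}}{\text{size of one child process}} \right\rfloor = \left\lfloor \frac{400\ \text{MB}}{20\ \text{MB}} \right\rfloor = 20$$

Allowing more children than that only pushes the machine into swapping.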
The classic tool for making programs run faster is the profiler. It counts clock ticks as each line of code is executed by the processor. The total count for each line shows the time it took. The output is a log file that can be sorted by a presentation package to show up the lines that take most time to execute. Very often problems are revealed that you can't do much about: processing has to be done, and it just takes time. However, occasionally the profiler shows you that the problem is caused by some subroutine being called unnecessarily often. You cut it out of the loop or reorganize the loop to work more efficiently, and your script leaps satisfyingly forward.
A Perl profiler, Devel::DProf, is available from CPAN (see http://search.cpan.org). There are two ways of using it under mod_perl (see the documentation). The better way is to put the following line in your Config file:
...
PerlModule Apache::DProf
...
This pulls in the profiler and creates a directory below <ServerRoot> called dprof/$$. In there you will find a file called tmon.out, which contains the results. You can study it by running the script dprofpp, which comes with the package.
Interesting as the results of a profiler are, it is not worth spending too much effort on them. If a part of the code accounts for 50% of the execution time (which is most unlikely), getting rid of it altogether will only double the speed of execution. It is much more likely that a part of the code accounts for 10% of the time, and getting rid of it (supposing you can) will cut execution time by 10%, which no one will notice.
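The arithmetic here is Amdahl's law. If a fraction p of the running time is removed altogether, what remains takes (1 - p) of the old time, so the speed-up S is:

$$S = \frac{1}{1 - p}, \qquad p = 0.5 \Rightarrow S = 2, \qquad p = 0.1 \Rightarrow S = \frac{1}{0.9} \approx 1.11$$

Removing a 10% hot spot therefore cuts the running time by 10% and raises throughput by only about 11%.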